MOT-APurify v1.2.1
------------------
MOTOROLA-syntax version.
(c) by Samuel DEVULDER
Sept. 1995
Samuel.Devulder@info.unicaen.fr
DESCRIPTION (SHORT):
--------------------
This is APurify for compilers that produce MOTOROLA-syntax asm files. As far
as I know, all compilers except GCC use such a syntax. If you're using the
GCC compiler, then read MIT-APurify.doc instead. This version is mainly
intended for the DICE compiler, but I think it can work with
other compilers. In the rest of this document, APurify stands
for MOT-APurify, and I assume you're using the DICE compiler.
APurify is a program that lets you detect bad memory accesses in
your programs without any specific external hardware (MMU).
It helps you find bugs caused by accessing memory not owned by your program.
This version is based on the MIT-syntax version. It is a small
improvement of APurify v1.1 on Aminet/dev/debug. It may be full of
bugs, so be careful!
INSTALLATION:
------------
This archive contains the version of APurify for the DICE compiler
as well as for other compilers. Here is a description of the DICE-related
files of this archive, and of what to do with each of them to
install APurify.
- doc/MOT-APurify.doc The file you are currently reading. Put it
with all your doc files. It is useful from
time to time.
- doc/History The whole history (this file is not very
useful for most people). Do whatever you
want with it.
- bin/MOT-APurify The parser tuned for the MOTOROLA syntax.
Rename it as APurify and put it somewhere in
your path. This program can be used with
any compiler that outputs MOTOROLA syntax
(i.e. all compilers except GCC).
- lib/APur-dice.lib The DICE link-time library. Rename it as
APur.lib and put it somewhere in your
library search path if you are using the
DICE compiler. It may work well with
other compilers (it is a COMMODORE-format
library). If this library does not work for
you (undefined labels or the like),
then try APur-pdc.lib. If that fails too,
then please contact me and I'll try to
include a specific version of the library
for that compiler in a future release.
- lib/APur-pdc.dir
- lib/APur-pdc.lib The PDC link-time library. Rename APur-pdc.lib as
APur.lib and put it somewhere in your
library search path if you are using the
PDC compiler. It may work well with other
compilers (it is a COMMODORE-format
library). If this library does not work for
you (undefined labels or the like),
please contact me and I'll try to include a
specific version of the library for that
compiler in a future release.
- test/test.c Source of a silly test program. It is just here to
let you rebuild the test program. Do
whatever you want with it.
- test/test.dice Test program, APurify'ed. Run it to see how
useful APurify is :-). (DICE-generated
file)
- test/test.pdc Test program, APurify'ed. Run it to see how
useful APurify is :-). (PDC-generated file)
SYNOPSIS:
--------
Usage: APurify [-revinfo] [flags] <inputfile> [-o <outputfile>]
Where flags can be:
-br<Ax> To set the base register
-tb To test memory referenced through base register
-ts To test memory referenced through stack register
-tl To test memory referenced through local stack frame
-tp To test pea instructions
-?,?,-h To display this usage
Flags can be anywhere on the command line and may be merged together.
But make sure that a flag needing an extra argument comes last in a
merged group. Thus "-tsoPROG.s" is good and will output a file called
"PROG.s", while "-otsPROG.s" is wrong and will output a file called
"tsPROG.s"! Here is a short description of arguments and flags:
-revinfo This displays information about APurify (name, size and
date of modules and number of compilations done for that
version).
-br<Ax> This sets the base register used to reference memory
in the SMALL_DATA model. Usually A4 is used for that purpose
and that's the default. If A5 is used instead, then add
-brA5 to your command line.
-tb This enables APurify to check all memory referenced through
the base register (see -br). If you are using a SMALL_DATA
model, add this flag to your command line. By default,
APurify won't check memory referenced through the base
register.
NOTE: for the safest check, you should always use that option,
even if you're not in the small-data model (A4 may be used as
a temporary register in that case).
-ts This enables APurify to check memory referenced through the stack
pointer (SP or A7). By default APurify won't check such
memory accesses (to reduce the code size and increase the
runtime speed). This option will detect when you have no
more room on your stack (stack overflow).
-tl This enables APurify to check memory referenced through the local
stack pointer (the one that is link'ed and unlink'ed when
entering and exiting a C function). By default, this is
switched off. This option allows APurify to detect stack
overflow.
-tp This enables APurify to check indirect addresses pushed onto
the stack by a pea. By default this is off. When
used, this option will check things like "pea 10(a2)" or
the like. This can help you with memory accessed through a
pointer in code that has not been APurify'ed. For example,
this is useful for things like fread(&ptr[10],10,1,fp),
because in that case the "pea 10(a2)" used to push
&ptr[10] on the stack will be checked, and if ptr[10] is not owned
by your program, you'll get an APurify error. Please note
that this may not work all the time, since &ptr[0] can be
translated as "move.l a0,-(sp)", which won't be checked.
-o <outputfile>
This specifies the name of the output file. If omitted, the
output file will be the same as the input file (source file).
-?
-h
? Obvious options.
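The rule for merged flags described above can be illustrated with a minimal C sketch. It is a hypothetical model of that rule, not APurify's own parser: the names `Opts` and `parse_merged()` are made up, and only the `-t<x>` and `-o` flags are modelled. The point it shows is that a flag taking an argument swallows the rest of the word.

```c
#include <string.h>

/* Hypothetical option set: only -t<x> and -o<file> are modelled. */
typedef struct {
    char tests[8];        /* letters seen after 't', e.g. "s" for -ts */
    const char *output;   /* argument of -o, or NULL if absent        */
} Opts;

/* Parse one merged word such as "-tsoPROG.s" into *o.
   The word must start with '-'. A flag that takes an argument (-o)
   swallows the REST of the word, which is why it must be the last
   flag of a merged group. */
static void parse_merged(const char *word, Opts *o)
{
    size_t n = strlen(o->tests);
    const char *p = word + 1;                 /* skip the leading '-' */

    while (*p != '\0') {
        if (*p == 't' && p[1] != '\0') {      /* -t<x>: remember x */
            if (n + 1 < sizeof o->tests) {
                o->tests[n++] = p[1];
                o->tests[n] = '\0';
            }
            p += 2;
        } else if (*p == 'o') {               /* -o<file>: rest of word */
            o->output = p + 1;
            return;
        } else {
            p++;                              /* letter not modelled here */
        }
    }
}
```

With "-tsoPROG.s" the sketch records the -ts test and an output of "PROG.s"; with "-otsPROG.s" the -o flag takes "tsPROG.s" and no test flag is set. A real parser would also take the next command-line word as the file name when `-o` ends a group, as in `-o <outputfile>`; this sketch leaves `output` pointing at an empty string in that case.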
DESCRIPTION (A BIT LONGER):
--------------------------
As a general rule, at the microprocessor level, there are two ways
to access memory: direct access and indirect access. For example,
in C, direct access can be viewed as accessing
global variables. Indirect access corresponds to accessing an array
value. More precisely, direct access corresponds to reading or writing
a variable whose address is known at compilation time (or at the latest
when the program is loaded into memory). Indirect access is used for
variables whose address is dynamically determined by the program. For
example, if p is a pointer to an array allocated by malloc(), *p is an
indirect access. Such an access also occurs with expressions like
T[i] where T is a global array, because the address of T[i] is not
known at compilation time, since it depends on the index value i. Using
indirect access to memory is called indirection.
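The distinction can be sketched in C. The names `g_counter`, `g_table` and `access_kinds()` are hypothetical, and the comments give the kind of access in the sense used above. The instructions actually emitted depend on the compiler and memory model (in the SMALL_DATA model, for instance, even `g_counter` is reached through the base register).

```c
#include <stdlib.h>

int g_counter;        /* global scalar */
int g_table[10];      /* global array  */

/* Each assignment is commented with the kind of access it performs.
   i must be in 0..9. Returns -1 if malloc() fails. */
int access_kinds(int i)
{
    int *p = malloc(4 * sizeof *p);
    int sum;

    if (p == NULL)
        return -1;
    g_counter = 1;    /* direct: the address of g_counter is a constant   */
    g_table[i] = 2;   /* indirect: &g_table[i] depends on run-time i      */
    *p = 3;           /* indirect: p is only known after malloc() returns */
    sum = g_counter + g_table[i] + *p;
    free(p);
    return sum;
}
```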
A correct program must not access memory it does not own. Such an
access is called illegal.
Illegal direct access to memory is not possible because, by
definition, only global variables can be accessed that way and those
variables obviously belong to the program (except for code written in
assembly language that references absolute addresses, for example:
"btst #6,$bfe001"; but that kind of code is not good programming
:-)). So we can assume that direct access to memory is always right.
On the other hand, indirect access to memory can definitely
be illegal. Many bugs come from overstepping array boundaries. If
the overstepping happens when reading a value, there is not much trouble
for other running tasks (the error stays inside your task); but if it
happens when writing, you may directly interfere with other tasks and a
big mess can happen (total breakdown of the system).
APurify works on that kind of access by verifying the validity of
indirect accesses to memory. It remembers the memory that was allocated
by the program and checks the integrity of each access. One might think
that makes a lot of tests! Well, yes, but APurify is not designed for
everyday use of programs, only for test phases. Moreover,
indirections do not occur very often in practice. Only array-based
variables produce indirections. Thus, the variables on the stack,
although accessed by indirection, are not checked (unless -ts or -tl
is given) because their access is always safe (at least if there is no
stack overflow!). Also, in the SMALL_DATA model, global variables are
accessed through indirection, but they are not checked (unless -tb is
given).
If an illegal access is found, APurify displays an error message on
the error stream of the program (have a look at the full justification
of the output when using verbose mode :^). There are two kinds of illegal
accesses. Some are accesses to memory that doesn't belong to the
program (this is called an access between blocks); others are
accesses to memory of which one part is owned by the program and another
part is not (this is an overstepping of a block). You can see this
visually: if [ 1 ] and [ 2 ] represent two blocks allocated by the
program and ( 3 ) the memory accessed, then
---- [ 1 ] ---- ( 3 ) ---- [ 2 ] ---->
0 increasing address
corresponds to the first kind of illegal access and
---- [ 1 ( ] 3 ) ---- [ 2 ] ----->
or
---- [ 1 ] ---- ( 3 [ ) 2 ] ----->
corresponds to the second kind. The first kind is very common
but the second is quite rare (it's rather a misalignment problem).
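The two kinds of illegal access can be modelled with a short C function. This is a sketch under stated assumptions (each block is described by a start address and a length, and blocks neither overlap nor touch each other), not APurify's actual code; `Block`, `Kind` and `classify()` are hypothetical names. With addresses taken from the example report later in this document, an access of 4 bytes at $00251900 comes out as "between blocks" and one at $002469c2 as "across a boundary".

```c
#include <stddef.h>

/* Hypothetical description of an owned block: bytes [start, start+len). */
typedef struct { unsigned long start, len; } Block;

typedef enum { ACCESS_OK, ACCESS_BETWEEN, ACCESS_ACROSS } Kind;

/* Classify an access of len bytes at addr against n blocks.
   ACCESS_OK      : entirely inside one block
   ACCESS_ACROSS  : partly inside a block, partly outside (overstepping)
   ACCESS_BETWEEN : touches no block at all                          */
static Kind classify(unsigned long addr, unsigned long len,
                     const Block *b, size_t n)
{
    unsigned long end = addr + len;               /* one past last byte */
    size_t k;

    for (k = 0; k < n; k++) {
        unsigned long bend = b[k].start + b[k].len;
        if (addr >= b[k].start && end <= bend)
            return ACCESS_OK;
        if (addr < bend && end > b[k].start)      /* partial overlap */
            return ACCESS_ACROSS;
    }
    return ACCESS_BETWEEN;
}
```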
APurify has two output modes. One is verbose and tries to give a lot
of information in words. The other one is briefer and gives
you the same information, but you'll have to decode it.
When APurify starts and ends, it outputs the date and time. This is
useful if you are using logfiles: you can keep all your logs
in a single file and find any run by its date of
execution.
In case of an error, APurify displays some text. The first line
looks like this:
**** APURIFY ERROR ! [$<N1>(<N2>) <ATTR> (<TEXT1>)] <TEXT2>:
That line represents the accessed memory. <N1> is the hexadecimal
address accessed. <N2> is the length of the access (in decimal). <ATTR>
represents the type of access. <TEXT1> allows you to find where in your
code the illegal access happened. <TEXT2> describes the kind of
illegal access.
If the length (<N2>) is 1, then it was a byte access; 2 stands for
a short (word) access, 4 for an int/long and more than 4 for a movem
instruction. Attributes, <ATTR>, can be "R--" or "-W-". The first one
represents a read access and the second a write access.
The text <TEXT1> looks like this:
<NAME>, PC=$<PC#> HUNK=$<HUNK#> OFFSET=$<OFF#>
<NAME> is the name of the subroutine where the error occurred. It is
always displayed (even if it is a "static" one). The rest of the line
can be partially displayed, showing as much information as APurify can
get. <PC#> is a hexadecimal address pointing to the instruction that
produced the error. <HUNK#> and <OFF#> are the hunk number and the
relative offset of <PC#>. Using <HUNK#> and <OFF#> and a disassembler,
you can very easily find where your code is bad (BTW, I use dobj from
netdcc, (c) by Matt Dillon). Please note that <PC#> can point to an
instruction before the faulty one. In that case, it points to a
PEA followed by a JSR. As those instructions do not belong to your
code (they are APurify stuff), the faulty instruction is the third
one. That happens only if an instruction references memory two
times and the first access is wrong. It is a little bit annoying, but
it is better than nothing and it is quite rare :-).
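If you want to post-process a logfile, the first line of an error can be decoded with sscanf(). This is a hypothetical helper written against the format shown above, not something shipped with APurify; `ApError` and `parse_ap_error()` are made-up names. It assumes the whole line is in one string and that <NAME> contains no comma or parenthesis.

```c
#include <stdio.h>
#include <string.h>

/* Fields of the first line of an APurify error (hypothetical struct). */
typedef struct {
    unsigned long addr;          /* <N1>: accessed address (hex)      */
    int len;                     /* <N2>: access length in bytes      */
    char attr[4];                /* <ATTR>: "R--" or "-W-"            */
    char name[64];               /* <NAME>: faulty subroutine         */
    unsigned long hunk, offset;  /* <HUNK#>, <OFF#>; 0 if not printed */
} ApError;

/* Returns 1 if the line could be decoded, 0 otherwise. */
static int parse_ap_error(const char *line, ApError *e)
{
    const char *p = strchr(line, '[');
    const char *q;

    memset(e, 0, sizeof *e);
    if (p == NULL ||
        sscanf(p, "[$%lx(%d) %3s (%63[^,)]", &e->addr, &e->len,
               e->attr, e->name) != 4)
        return 0;
    /* HUNK and OFFSET may be missing: APurify prints what it can get. */
    if ((q = strstr(p, "HUNK=$")) != NULL)
        sscanf(q, "HUNK=$%lx", &e->hunk);
    if ((q = strstr(p, "OFFSET=$")) != NULL)
        sscanf(q, "OFFSET=$%lx", &e->offset);
    return 1;
}
```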
The remaining lines show the context of the illegal access. They
give you information about the surrounding memory blocks owned by
your program. Each block is displayed according to the following
pattern:
[$<N1>(<N2>) <ATTR> (<TEXT>)]
where <N1> is the hexadecimal address of the beginning of the block and
<N2> its length (in decimal). Note that the length may seem to be
longer than the one allocated by malloc() and the address may point
before the one you obtained via malloc(). This is not wrong! In fact,
the malloc() subroutine may add some information
(like a doubly-linked list or the length of the allocation) to the
block you've requested. That extra information is put before the
address you receive, which explains this behavior. In this version of
APur-dice.lib, this takes 8 extra bytes. So if you allocate 10 bytes,
don't be surprised if APurify thinks you've requested 18 bytes.
<ATTR> are 3 status characters RWS
where R means: read-enable block
W means: write-enable block
S means: system block (block not controlled by the program).
If an access kind is forbidden, a '-' replaces the corresponding
letter. <TEXT> is the name of the procedure that
allocated the block. If it ends with "*", that block was allocated by a
call to a subroutine not parsed by APurify during the execution of the
one indicated (a library call, maybe).
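The length and address shift caused by malloc()'s own bookkeeping is simple arithmetic. A minimal sketch, assuming (as stated above for this version of APur-dice.lib) that the 8 extra bytes sit entirely in front of the pointer malloc() returns; `MALLOC_HEADER`, `shown_len()` and `shown_start()` are hypothetical names.

```c
/* Assumption from the text above: this version of APur-dice.lib's
   malloc() keeps 8 bytes of its own in front of every allocation. */
#define MALLOC_HEADER 8UL

/* Length APurify shows for the block behind malloc(n). */
static unsigned long shown_len(unsigned long n)
{
    return n + MALLOC_HEADER;
}

/* Start address APurify shows, given the pointer malloc() returned. */
static unsigned long shown_start(unsigned long user_ptr)
{
    return user_ptr - MALLOC_HEADER;
}
```

This matches the example report in this document: the 12-byte block at $002469b8 belongs to a malloc(4), which puts the pointer 'a' at $002469c0, consistent with a[1] being read at $002469c4 in the second error.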
With each block you can find an offset. That offset is the distance
between that block and the faulty address. In verbose mode, you can
see some text explaining the relative position of a block
and the accessed memory. In non-verbose mode you only see the
offsets followed by the blocks. The smallest offset is displayed first,
since that block is the one most likely to have been overstepped.
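The offsets printed in the example report below can be reproduced with a small C function. The formula is inferred from those numbers, not taken from APurify's source, and it only covers blocks lying entirely before or after the access (the "between" case); the across-boundary case (the -2 of the third error) seems to follow another convention and is not covered. `block_offset()` and `by_distance()` are hypothetical names.

```c
#include <stdlib.h>

/* Signed distance shown before a block in a "between" report,
   as inferred from the example report:
   block after the access  -> last accessed byte minus block start
                              (negative);
   block before the access -> first accessed byte minus last byte
                              of the block (positive). */
static long block_offset(unsigned long addr, unsigned long len,
                         unsigned long bstart, unsigned long blen)
{
    if (bstart > addr)
        return (long)(addr + len - 1) - (long)bstart;
    return (long)addr - (long)(bstart + blen - 1);
}

/* qsort() comparator: smallest |offset| first. */
static int by_distance(const void *a, const void *b)
{
    long x = labs(*(const long *)a);
    long y = labs(*(const long *)b);
    return (x > y) - (x < y);
}
```

For the first error, the 4-byte read at $00251900 gives -29 for the block at $00251920 and +44861 for the block at $002469b8, and sorting by distance puts -29 first, as in the report.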
When an illegal write occurs (the only really dangerous thing you can
do through indirection), APurify tells you that the error is really
dangerous and asks if you wish to stop your program. If you do,
exit() is called. You can also ignore that error or ignore all such
errors (but then you'll surely meet the guru!).
APurify also reports memory allocated but not freed by the program
(in fact, it detects non-deallocated blocks when the library is closed).
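Leak detection at closing time boils down to bookkeeping: remember each allocation, forget it when it is freed, and list what remains. A hypothetical sketch with a fixed-size table (the names and the 32-entry limit are assumptions, not APurify internals), listing leftovers in allocation order as the example report does:

```c
#include <stddef.h>

#define MAX_BLOCKS 32

/* Hypothetical registry of allocations, kept in allocation order. */
typedef struct { unsigned long start, len; int live; } Rec;
static Rec recs[MAX_BLOCKS];
static size_t nrecs;

static void note_alloc(unsigned long start, unsigned long len)
{
    if (nrecs < MAX_BLOCKS) {
        recs[nrecs].start = start;
        recs[nrecs].len = len;
        recs[nrecs].live = 1;
        nrecs++;
    }
}

static void note_free(unsigned long start)
{
    size_t k;

    for (k = 0; k < nrecs; k++)
        if (recs[k].live && recs[k].start == start) {
            recs[k].live = 0;
            return;
        }
}

/* At closing time: copy the start of every block never freed into
   out[] (at most max entries), in allocation order.
   Returns how many were copied. */
static size_t leaked(unsigned long *out, size_t max)
{
    size_t k, n = 0;

    for (k = 0; k < nrecs; k++)
        if (recs[k].live && n < max)
            out[n++] = recs[k].start;
    return n;
}
```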
It also knows about memory locations that do not depend on the
program's execution, that is to say the first kilobyte of memory, which
contains the interrupt vectors of the 680x0 processor, the program
segments and the stack. Accessing those blocks is not illegal. They get
the S attribute (for SYSTEM blocks).
It takes into account memory blocks allocated by malloc() and
AllocMem(), and indirectly allocated blocks (by OpenScreen() for
example). But I did not test the last kind of allocation. Anyway, it
should be OK, because APurify patches the AllocMem() and FreeMem()
entries. Thus a program can access the bitplanes of one of its screens
without error.
If the program makes a legal access but the block's attributes are
incompatible with the kind of access, a protection-error message is
displayed. Currently, only the first kilobyte is read/write-protected,
but this may change in the future.
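The attribute letters and the protection check can be sketched with bit flags. The names `ATTR_R`, `ATTR_W`, `ATTR_S`, `format_attr()` and `access_allowed()` are hypothetical; only the letters and their meaning come from the text above. A "--S" block such as the first kilobyte refuses both reads and writes, which is the protection error shown for the zero page in the example report.

```c
/* Hypothetical attribute bits, following the RWS letters. */
enum { ATTR_R = 1, ATTR_W = 2, ATTR_S = 4 };

/* Build the 3-character attribute field, '-' for each missing letter. */
static void format_attr(unsigned attr, char out[4])
{
    out[0] = (attr & ATTR_R) ? 'R' : '-';
    out[1] = (attr & ATTR_W) ? 'W' : '-';
    out[2] = (attr & ATTR_S) ? 'S' : '-';
    out[3] = '\0';
}

/* An access inside a known block still fails if the block lacks the
   needed right: that is the protection-error case. */
static int access_allowed(unsigned attr, int is_write)
{
    return is_write ? (attr & ATTR_W) != 0 : (attr & ATTR_R) != 0;
}
```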
In order to speed up block searching, APurify uses a cache of
recently accessed blocks. Thus, even if there is a large number of
memory blocks, execution should not be slowed down too much (but I
must say I doubt it is efficient enough).
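The idea of a cache in front of the block search can be sketched as follows. This is an assumption about the general technique, not APurify's implementation: a small array holds the blocks most recently found by a linear search and is checked first. `Blk`, `find_block()` and `CACHE_SIZE` are hypothetical names.

```c
#include <stddef.h>

typedef struct { unsigned long start, len; } Blk;

#define CACHE_SIZE 4

/* Blocks most recently found by the slow path, newest first. */
static const Blk *cache[CACHE_SIZE];
static unsigned long cache_hits;

static int contains(const Blk *b, unsigned long addr)
{
    return addr >= b->start && addr - b->start < b->len;
}

/* Return the block owning addr, or NULL if no block owns it. */
static const Blk *find_block(const Blk *all, size_t n, unsigned long addr)
{
    size_t k;
    int i;

    for (i = 0; i < CACHE_SIZE; i++)          /* 1. try the cache */
        if (cache[i] != NULL && contains(cache[i], addr)) {
            cache_hits++;
            return cache[i];
        }
    for (k = 0; k < n; k++)                   /* 2. slow linear search */
        if (contains(&all[k], addr)) {
            for (i = CACHE_SIZE - 1; i > 0; i--)  /* drop the oldest */
                cache[i] = cache[i - 1];
            cache[0] = &all[k];
            return &all[k];
        }
    return NULL;                              /* not owned: an error */
}
```

A real implementation must also remove a block from the cache when it is freed; otherwise a stale entry would hide an access to freed memory.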
HOW TO USE APURIFY:
------------------
One can see APurify as a pre-assembler. It must be run on an assembly
language source file just before the assembler. It scans the
file and changes it a bit so that APur.lib can be used.
The normal way to use it for a C program is to:
- compile the C source files and keep the assembly language source (.a).
- run APurify on each .a file.
- assemble each .a file to get a .o file.
- link all .o files together with APur.lib.
For example, using dcc (DICE) on prog.c, this gives:
CLI> dcc -a prog.c -o prog.a
CLI> APurify -tb prog.a
CLI> dcc -s prog.a -o prog -lAPur
As you can see, APurify needs no change to your C files to be used.
However, the library must be opened by calling AP_Init() in the main()
function. Note that you no longer need to call AP_Close() (you still
can, but it is useless: it is automatically called on
exit()). But do not use Exit() to abort your program; I think it'll
crash if APurify is running. If you must use Exit(), then call
AP_Close() just before calling Exit(). The explanation is simple: since
some system functions are patched, if a program exits without closing
the library, those patches will be corrupted, pointing to code that is
no longer in memory, and you'll meet the guru (i.e. the computer will
crash)... (You've been warned :-).
If you forget to open the library, a warning message will tell you
about it and the program will run just as if it hadn't been processed
by APurify.
You can disable/enable printing of messages by calling
AP_Report(flag). If flag is true (i.e. nonzero), then printing is
enabled; if it is false (i.e. zero), no output is produced.
This is useful for startup code. For example, if you are
using the argv[] array in C, APurify will print a lot of false
errors. This is because the values pointed to by this array are allocated
before the library is opened. You can avoid this by calling
AP_Report(0) before, and AP_Report(1) after, the code that uses argv[].
When debugging an APurify'ed program, you can put a breakpoint on
a function called AP_Err(). That function is called each time
APurify detects an error. With that, you can look
at your program just before a faulty memory access occurs.
You can switch from verbose output to a shorter one with
AP_Verbose(flag). If flag is true, then verbose mode is on. If it is
false, then only short messages will be printed. Some people prefer the
latter, so that is the default. If you prefer the verbose output, then
put AP_Verbose(1) somewhere in your code and you'll get longer
explanations about illegal accesses.
You can specify a logfile where APurify will put its errors. To do
this, set the environment variable "APlog" (file env:APlog) to the name
of a logfile. If this variable is set, then APurify will append all its
output to the file indicated.
You can use APurify on any language that generates a temporary
assembly language source file (including assembly itself :-) ). Note
too that you can use it on programs for which no source code is
available (or .o files without .asm files). For that, use a program
that can do reverse engineering on your executable (i.e. one that
disassembles the executable and produces a .asm file ready to be
assembled). Then, with minor changes (prepend '_' and append ':' to
every interesting label, put a call to AP_Init in the right place),
you get a file ready to be processed by APurify. If the processed file
has a HUNK_SYMBOL, then you are very lucky and need not work on
labels. You then just have to find "_main:" and add "jbsr _AP_Init"
as the first instruction of the "_main:" subroutine.
Note: you can use ADIS (by Martin Apel), available on Aminet, to do
reverse engineering (it seems to be quite a good tool for that).
EXAMPLE:
-------
As an example, let's look at the test.dice program. You'll see how
you can use the APurify report it produces to find what's wrong in
the program. For this, I've included the commented report in this
document. My comments/explanations appear on lines beginning with a "#".
**** APurify started on Mon Sep 04 23:13:56 1995
#
# Well, the report started...
#
**** APURIFY ERROR ! [$00251900(4) R-- (_main, PC=$0025c36a HUNK=$0
OFFSET=$232)] accessed between:
-29 [$00251920(23) RW- (_main*)]
+44861 [$002469b8(12) RW- (_main*)]
# Hum... First hit... it is an error reading something in the main()
# procedure, between two already allocated blocks. The nearest block
# appears in first position, so we can guess that the error comes from
# accessing an array allocated in main() with a negative index. We can
# look at the code to find what is wrong with it. Using DOBJ, we find
# at offset $232 in the first hunk the following code:
#
# 00.00000232 4852 PEA.L (A2)
# 00.00000234 4eb9 AP_WriteL JSR AP_WriteL
# 00.0000023a 24ab ffd8 MOVE.L -40(A3),(A2)
#
# The pointed instruction is a PEA followed by a JSR. So the
# interesting instruction is the third one. This corresponds to the C
# code:
#
# a[0]=b[-10]
#
# Hence we've discovered a first error in the code. Note that -29 is
# the distance (in bytes) between the end of the accessed memory and
# the beginning of the array. This is not the difference between the
# beginning addresses of the two blocks!
#
**** APURIFY ERROR ! [$002469c4(4) R-- (_main, PC=$0025c39a HUNK=$0
OFFSET=$262)] accessed between:
+1 [$002469b8(12) RW- (_main*)]
-44937 [$00251950(776) RWS (segment Module CLI)]
#
# Well... here it seems to be an access just after an allocated block.
# The offset +1 is the distance in bytes between the accessed block and
# an allocated block. The situation is like this:
#
# ---------[ 1 ]( 2 )---------->
#
# Where "[ 1 ]" is the allocated block and "( 2 )" the accessed block.
# If we look in the code, we find:
#
# 00.00000262 4aaa 0004 TST.L 4(A2)
#
# that corresponds to the test done by "if(a[1] == 0)". This is an error
# since the array 'a' is just 12-8=4 bytes long (the 12-byte block
# includes the 8 bytes added by malloc()). So a[1] points out of
# the array!
#
**** APURIFY ERROR ! [$002469c2(4) R-- (_read_shifted, PC=$0025c282
HUNK=$0 OFFSET=$14a)] accessed across the ending boundary of:
-2 [$002469b8(12) RW- (_main*)]
#
# Hehe, another error... Damn! That test program is FULL of bugs!
# Yes, but this one is another kind of error. It is an access across a
# boundary. It occurs in the read_shifted() code. We need not look in
# the asm file to see the error. Here it is a misalignment error.
# Visually that gives:
#
# ------------[ 1(]2 )----------->
#
# [ 1 ] = allocated ( 2 ) = accessed.
#
**** APURIFY ERROR ! [$002469c0(4) R-- (_read_long, PC=$0025c29e
HUNK=$0 OFFSET=$166)] accessed between:
-44941 [$00251950(776) RWS (segment Module CLI)]
+179933 [$00219e64(3200) RWS (standard stack frame of task)]
#
# That error is strange! It is not an access to an array with a
# negative index, as one might first think: we never call read_long() in
# such a way. Indeed, the accessed memory was valid some time ago,
# since it lies in the array 'a' (look at the second hit). Hence, it
# must be an access to freed memory. That error is then obviously
# found in the code:
#
# free_arg(a); read_long(a).
# ^^^^^^^^^^^^
# NOTE: You can see that the program ran with a stack of 3200 bytes.
#
**** APURIFY ERROR ! [$00000004(4) R-- (_read_page_zero, PC=$0025c2de
HUNK=$0 OFFSET=$1a6)] accessed on a read-protected block:
+4 [$00000000(1024) --S (Basic 680x0 vectors)]
#
# Here the error is obvious: we are reading the zero page. If it were
# a write, that error would be very dangerous.
#
**** APURIFY WARNING ! Closing library without deallocation of
the following block(s):
- [$00252060(408) RW- (_main*)]
- [$002635e8(12008) RW- (_main*)]
- [$002664d0(40008) RW- (_main*)]
#
# The program has exit()ed. APurify tells us that we forgot to free
# those blocks. It is a case of memory leak. Those blocks were
# allocated in main(). They appear in order of allocation. They were
# allocated and lost by
#
# a=malloc(4),malloc(400),malloc(12000),malloc(40000)
#
# since '=' binds tighter than ',', so 'a' only receives the leftmost value.
#
**** APurify ended on Mon Sep 04 23:13:59 1995
#
# Well... done :-).
#
NOTE: I hope this example is clear enough... but I'm not sure... tell me
:^).
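The precedence point behind the last warning of the example can be checked with a few lines of C. `next_id()` is a hypothetical stand-in for malloc() that keeps the example leak-free; as with the four malloc() calls in the test program, every call still runs, but without parentheses only the first result is stored.

```c
static int calls;                    /* how many times next_id() ran */

/* Hypothetical stand-in for malloc(): returns its argument, counts calls. */
static int next_id(int v)
{
    calls++;
    return v;
}

static int without_parens(void)
{
    int a;
    a = next_id(4), next_id(400), next_id(12000), next_id(40000);
    return a;        /* parsed as (a = next_id(4)), ...: a is 4 */
}

static int with_parens(void)
{
    int a;
    a = (next_id(4), next_id(400), next_id(12000), next_id(40000));
    return a;        /* the comma operator yields its LAST operand */
}
```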
LEGAL PART:
----------
This program is provided 'AS IS'. I am not responsible for any
damage it can cause (but I am responsible for the benefits it can give
you :-). Use this software at your own risk.
This program is FREEWARE. You can use and distribute it as long as
you keep the archive intact (no modification of files except for
compression). It can't be sold without my agreement (except for a
minimal amount covering media). You must ask me before any commercial
use of (any part of) this product. I keep all my rights on this program
and its future releases. I can modify this software without notifying
users.
If you wish, you can send me a postcard or anything else you want
(money, documentation, amiga, hardware stuff, ...) in exchange for
using APurify. But there is no obligation :-). My postal address is:
M. DEVULDER Samuel
1, Rue du chateau
59380 STEENE
FRANCE
(yes, I'm French!). You can send suggestions or bug reports to my email
address:
devulder@info.unicaen.fr
NOTES:
-----
My configuration is: one old A500 (1989), 2 MB RAM, 1 disk drive, 1
hard drive [300 MB, 10% full :-)], KS1.3 and a lot of patience (ah, I
wish I had an A4000/040/33MHz that does not meet the guru all the
time!).
It has been compiled with freedice 2.06.37.
I had the idea of that program after a chat with Cedric BEUST
(AMIGA NEWS) on IRC (Internet Relay Chat). Thanks Cedric !
All marks are proprietary of their respective owners.
There are some programs like APurify. For example, FORTIFY (Simon
P. Bullen), but it only detects illegal writes at the boundaries of
allocated blocks. Thus it can't detect large oversteps or oversteps when
reading, and its detection is not real-time. Enforcer can detect illegal
accesses to memory (I think), but it needs special hardware (an MMU).
HINTS & TIPS:
------------
You can spot some memory leaks with this version of APurify. It is
not really good, but it can help. A memory leak occurs when a block of
memory is no longer pointed to by your program. Such memory blocks will
necessarily be displayed when your program exit()s. So with all the
messages printed on that occasion, you can find such blocks. I know
this is not so great, but I think it can help you a little bit (maybe
in a future version I'll write some code to really check memory leaks).
BUGS:
----
APurify doesn't know about public memory, which a program can read or
write without having allocated it. Thus, it will report an error when a
program reads or writes values in a message obtained through GetMsg()
calls. Use AP_Report() to avoid such reports.
It can display messages about closing the library without freeing
some memory blocks. This is due to printf(), which allocates memory that
is freed on exit. This is not a real bug, and you can avoid it by
calling AP_Report(0) just before exiting. But note that it
is better to display false bugs than to miss real ones.
There are certainly more bugs; I'm waiting for your bug reports.